PALO bounds for reinforcement learning in partially observable stochastic games

Authors

Abstract

We introduce reinforcement learning for heterogeneous teams in which the rewards for an agent are additively factored into local costs, stimuli unique to each agent, and global rewards, those shared by all agents in the domain. Motivating domains include coordination of varied robotic platforms, which incur different costs for the same action but share an overall goal. We present two templates for learning in this setting with factored rewards: a generalization of Perkins' Monte Carlo exploring starts for POMDPs to canonical MPOMDPs, with a single policy mapping joint observations to joint actions (MCES-MP); and another in which each agent individually maps joint observations to its own action (MCES-FMP). We use probably approximately locally optimal (PALO) bounds to analyze the sample complexity, instantiating these templates to PALO learning. We promote sample efficiency by including a policy space pruning technique, and evaluate the approaches on three domains, demonstrating that MCES-FMP yields improved policies with fewer samples compared to MCES-MP and a previous benchmark.
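
To make the setup concrete, here is a minimal Python sketch of the additively factored reward and of the two policy templates for a two-agent team; all names and the exploring-starts seeding shown are illustrative assumptions, not the authors' implementation.

```python
import random
from typing import Dict, List, Tuple

JointObs = Tuple[str, str]  # one observation per agent (two agents here)
JointAct = Tuple[str, str]  # one action per agent

def factored_rewards(global_reward: float,
                     local_costs: Tuple[float, float]) -> Tuple[float, float]:
    # Additive factoring: each agent's reward is the team-wide global
    # reward plus its own agent-specific local cost.
    return tuple(global_reward + c for c in local_costs)

# MCES-MP template: one policy maps joint observations to joint actions.
mp_policy: Dict[JointObs, JointAct] = {}

def mp_act(obs: JointObs, joint_actions: List[JointAct]) -> JointAct:
    # Exploring-starts flavor: an unseen joint observation is seeded
    # with a random joint action.
    return mp_policy.setdefault(obs, random.choice(joint_actions))

# MCES-FMP template: each agent maps joint observations to its own action.
fmp_policies: Tuple[Dict[JointObs, str], ...] = ({}, {})

def fmp_act(obs: JointObs, actions: List[str]) -> JointAct:
    return tuple(pi.setdefault(obs, random.choice(actions))
                 for pi in fmp_policies)
```

For example, factored_rewards(10.0, (-2.0, -5.0)) returns (8.0, 5.0): heterogeneous platforms share the same goal while paying different costs for their actions.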


Similar articles

Multi-task Reinforcement Learning in Partially Observable Stochastic Environments

We consider the problem of multi-task reinforcement learning (MTRL) in multiple partially observable stochastic environments. We introduce the regionalized policy representation (RPR) to characterize the agent’s behavior in each environment. The RPR is a parametric model of the conditional distribution over current actions given the history of past actions and observations; the agent’s choice o...
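
A rough numerical sketch of such a history-conditioned action distribution is below; the latent-region belief update and all parameter shapes are assumptions made for illustration, not the paper's exact parameterization.

```python
import numpy as np

rng = np.random.default_rng(0)
n_regions, n_actions, n_obs = 4, 3, 5

# Hypothetical RPR-style parameters:
# pi[z, a]       -- action distribution inside latent region z
# T[z, a, o, z'] -- region transition given the last action and observation
pi = rng.dirichlet(np.ones(n_actions), size=n_regions)
T = rng.dirichlet(np.ones(n_regions), size=(n_regions, n_actions, n_obs))

def action_distribution(history):
    """P(a_t | h_t) for a history of (action, observation) index pairs."""
    b = np.full(n_regions, 1.0 / n_regions)  # belief over latent regions
    for a, o in history:
        b = b @ T[:, a, o, :]  # propagate the region belief
        b /= b.sum()
    return b @ pi  # mixture of the per-region action distributions
```

Calling action_distribution([(0, 2), (1, 4)]) returns a length-3 probability vector over current actions, conditioned on the two past action-observation pairs.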


Dynamic Programming for Partially Observable Stochastic Games

We develop an exact dynamic programming algorithm for partially observable stochastic games (POSGs). The algorithm is a synthesis of dynamic programming for partially observable Markov decision processes (POMDPs) and iterative elimination of dominated strategies in normal form games. We prove that it iteratively eliminates very weakly dominated strategies without first forming the normal form r...
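
The game-theoretic ingredient is easy to sketch in isolation. The toy function below eliminates rows of a payoff matrix under very weak dominance by pure strategies; the paper's algorithm operates on multiagent belief states and tests dominance against mixed strategies, so this is illustrative only.

```python
import numpy as np

def surviving_rows(payoff: np.ndarray) -> list:
    """Row strategies surviving iterated elimination of very weakly
    dominated strategies (dominance by pure strategies only).

    payoff[i, j]: row player's payoff for row strategy i against column j.
    """
    alive = list(range(payoff.shape[0]))
    changed = True
    while changed:
        changed = False
        for i in list(alive):
            # Row i is very weakly dominated if another surviving row
            # does at least as well against every column strategy.
            if any(np.all(payoff[k] >= payoff[i]) for k in alive if k != i):
                alive.remove(i)
                changed = True
                break
    return alive
```

For instance, surviving_rows(np.array([[3, 3], [2, 1], [3, 4]])) returns [2], since row 2 does at least as well as rows 0 and 1 against both columns.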


Inverse Reinforcement Learning in Partially Observable Environments

Inverse reinforcement learning (IRL) is the problem of recovering the underlying reward function from the behaviour of an expert. Most of the existing algorithms for IRL assume that the expert’s environment is modeled as a Markov decision process (MDP), although they should be able to handle partially observable settings in order to widen the applicability to more realistic scenarios. In this p...
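
As a point of reference for the fully observable case the abstract starts from, the following toy check (a construction in the spirit of Ng and Russell's formulation, not this paper's partially observable algorithm) tests whether a candidate reward vector makes an expert's action Bellman-optimal in a tabular MDP.

```python
import numpy as np

def expert_consistent(P, expert_a, R, gamma=0.9):
    """True if reward vector R makes playing `expert_a` in every state
    Bellman-optimal in a small tabular MDP.

    P[a] is the |S| x |S| transition matrix of action a; R is |S|-dim.
    """
    n = R.shape[0]
    # Value of the stationary expert policy: v = (I - gamma * P_a)^-1 R
    v = np.linalg.solve(np.eye(n) - gamma * P[expert_a], R)
    # One-step lookahead values for every action
    q = np.array([R + gamma * P[a] @ v for a in range(len(P))])
    return bool(np.all(q[expert_a] >= q.max(axis=0) - 1e-9))
```

IRL then amounts to searching over reward vectors R for which this consistency test holds, typically with extra regularization to rule out degenerate solutions such as R = 0.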


Nonparametric Bayesian Approaches for Reinforcement Learning in Partially Observable Domains

The objective of my doctoral research is to bring together two fields, partially observable reinforcement learning (PORL) and nonparametric Bayesian statistics (NPB), to address issues of statistical modeling and decision-making in complex, real-world domains.


Bayesian Nonparametric Approaches for Reinforcement Learning in Partially Observable Domains

Making intelligent decisions from incomplete information is critical in many applications: for example, medical decisions must often be made based on a few vital signs, without full knowledge of a patient’s condition, and speech-based interfaces must infer a user’s needs from noisy microphone inputs. What makes these tasks hard is that we do not even have a natural representation with which to ...



Journal

Journal title: Neurocomputing

Year: 2021

ISSN: 0925-2312, 1872-8286

DOI: https://doi.org/10.1016/j.neucom.2020.08.054